High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications
Authors
Abstract
Leveraging optimization techniques (e.g., register blocking and double buffering) introduced in the context of KBLAS, a high-performance Level 2 BLAS library for GPUs, the authors implement dense matrix-vector multiplications within a sparse-block structure. While these optimizations are important for high-performance dense kernel executions, they are even more critical when dealing with sparse linear algebra operations. The most time-consuming phase of many multi-component applications, such as models of reacting flows or petroleum reservoirs, is the solution, at each implicit time step, of large, sparse, spatially structured or unstructured linear systems. The standard method is a preconditioned Krylov solver. Sparse Matrix-Vector multiplication (SpMV) is, in turn, one of the most time-consuming operations in such solvers. Because the matrix elements are not reused within a single SpMV, kernel performance is limited by the speed at which data can be transferred from memory to registers, making bus bandwidth the major bottleneck. On the other hand, in the case of a multi-species model, the resulting Jacobian has a dense block structure. For contemporary petroleum reservoir simulations, the block size typically ranges from three to a few dozen among different models, and still larger blocks are relevant within adaptively model-refined regions of the domain, though generally the block size, related to the number of conserved species, is constant over large regions within a given model. This structure can be exploited beyond the convenience of a block compressed row data format, because it offers opportunities to hide data motion behind useful computation. The new SpMV kernel outperforms existing state-of-the-art implementations on single and multiple GPUs, using matrices with a dense block structure representative of porous media applications with both structured and unstructured multi-component grids.
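To make the block-structure argument concrete, the following is a minimal CUDA sketch of a block-CSR (BSR) SpMV with a compile-time block size. The kernel name, the block size BS, and the row-major block layout are illustrative assumptions; the sketch shows only the x-subvector reuse that the dense blocks enable, not the register-blocked, double-buffered KBLAS-based kernel described in the paper.

```cuda
// Minimal BSR (block compressed sparse row) SpMV sketch: y = A*x.
// The block size BS is a compile-time constant; real kernels add register
// blocking and double buffering on top of this basic scheme.
#include <cuda_runtime.h>

#define BS 4  // illustrative block size (e.g., number of conserved species)

// One CUDA block per block row; thread t computes row t within that block row.
// Launch with exactly BS threads per block: bsr_spmv<<<n_block_rows, BS>>>(...).
__global__ void bsr_spmv(int n_block_rows,
                         const int*    __restrict__ block_ptr,  // size n_block_rows+1
                         const int*    __restrict__ block_col,  // block column indices
                         const double* __restrict__ vals,       // dense BS*BS blocks, row-major
                         const double* __restrict__ x,
                         double*       __restrict__ y)
{
    int brow = blockIdx.x;        // block row handled by this CUDA block
    int t    = threadIdx.x;       // 0 .. BS-1

    __shared__ double xs[BS];     // x sub-vector shared by all BS rows of a block
    double acc = 0.0;

    for (int k = block_ptr[brow]; k < block_ptr[brow + 1]; ++k) {
        int bcol = block_col[k];
        xs[t] = x[bcol * BS + t];            // stage the x sub-vector once per block
        __syncthreads();

        const double* blk = vals + (size_t)k * BS * BS;
        for (int j = 0; j < BS; ++j)
            acc += blk[t * BS + j] * xs[j];  // each staged x entry is reused BS times
        __syncthreads();                     // keep xs intact until all rows used it
    }
    y[brow * BS + t] = acc;
}
```

A launch such as bsr_spmv<<<n_block_rows, BS>>>(...) assigns one small CUDA block per block row; a production kernel would process several block rows per CUDA block, keep the dense blocks in registers, and double-buffer the loads so that data motion overlaps the block-level multiplications.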
Similar resources
Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs
Simulations of many multi-component PDE-based applications, such as petroleum reservoirs or reacting flows, are dominated by the solution, on each time step and within each Newton step, of large sparse linear systems. The standard solver is a preconditioned Krylov method. Along with application of the preconditioner, memory-bound Sparse Matrix-Vector Multiplication (SpMV) is the most time-consu...
On the performance and energy efficiency of sparse linear algebra on GPUs
In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix–vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel’s MKL for multicore CPUs, and develop a GPU sparse matrix–matrix product (SpMM) implementation that handles the simultan...
Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU
Parallel programming has moved from single-core to multicore architectures. Graphics Processor Units (GPUs) have recently emerged as outstanding platforms for data parallel applications with regular data access patterns. However, it is still challenging to optimize computations with irregular data access patterns like sparse matrix-vector multiplication (SPMV). SPMV...
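For contrast with the blocked sketch above, a minimal scalar CSR SpMV kernel makes the irregular access pattern explicit; the name csr_spmv and its signature are assumptions, and this is not the QCSR kernel of the cited paper.

```cuda
// Minimal scalar CSR SpMV (one thread per row). The gather x[col_idx[k]]
// is data-dependent and irregular, which is the main obstacle to high
// memory throughput that formats such as QCSR try to mitigate.
__global__ void csr_spmv(int n_rows,
                         const int*    __restrict__ row_ptr,
                         const int*    __restrict__ col_idx,
                         const double* __restrict__ vals,
                         const double* __restrict__ x,
                         double*       __restrict__ y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;

    double acc = 0.0;
    for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
        acc += vals[k] * x[col_idx[k]];   // indirect, data-dependent access to x
    y[row] = acc;
}
```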
A Survey on Performance Modelling and Optimization Techniques for SpMV on GPUs
A sparse matrix is a matrix with very few non-zero entries. Large sparse matrices are often used in engineering and scientific computations. In particular, sparse matrix-vector multiplication is an important operation for solving linear systems and partial differential equations. However, there is a possibility that even though the matrix is partitioned and stored appropriately, the performance...
Sparse-matrix vector multiplication on hybrid CPU+GPU platform
Sparse matrix-vector multiplication (SpMV) is a basic operation in many linear algebra kernels, so it is interesting to have SpMV on modern architectures like GPUs. Since it is an irregular computation, the CPU performs comparably to the GPU, which makes hybrid CPU+GPU architectures interesting for this routine. We have therefore designed a hybrid algorithm for SpMV which uses both a CPU and a GPU. We have ex...
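As a rough illustration of the kind of work partitioning such a hybrid scheme involves (not the cited algorithm itself; the function names, the row-wise split, and the fixed split point are assumptions), the following sketch computes the first rows on the GPU while the CPU handles the remainder.

```cuda
#include <cuda_runtime.h>

// GPU part: scalar CSR SpMV restricted to rows [row_begin, row_end).
__global__ void csr_spmv_rows(int row_begin, int row_end,
                              const int* row_ptr, const int* col_idx,
                              const double* vals, const double* x, double* y)
{
    int row = row_begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= row_end) return;
    double acc = 0.0;
    for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
        acc += vals[k] * x[col_idx[k]];
    y[row] = acc;
}

// Host driver: rows [0, split) on the GPU, rows [split, n_rows) on the CPU.
// d_* arrays live on the device and h_* arrays on the host; both hold copies
// of the same CSR matrix and input vector.
void hybrid_spmv(int n_rows, int split,
                 const int* d_row_ptr, const int* d_col_idx,
                 const double* d_vals, const double* d_x, double* d_y,
                 const int* h_row_ptr, const int* h_col_idx,
                 const double* h_vals, const double* h_x, double* h_y)
{
    const int threads = 256;
    if (split > 0) {
        int blocks = (split + threads - 1) / threads;
        // Kernel launch is asynchronous with respect to the host,
        // so the CPU loop below overlaps the GPU work.
        csr_spmv_rows<<<blocks, threads>>>(0, split, d_row_ptr, d_col_idx,
                                           d_vals, d_x, d_y);
    }

    for (int row = split; row < n_rows; ++row) {
        double acc = 0.0;
        for (int k = h_row_ptr[row]; k < h_row_ptr[row + 1]; ++k)
            acc += h_vals[k] * h_x[h_col_idx[k]];
        h_y[row] = acc;
    }

    cudaDeviceSynchronize();  // wait for the GPU rows before merging results
}
```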